Feature selection for imbalanced data based on neighborhood tolerance mutual information and whale optimization algorithm
Lin SUN, Jinxu HUANG, Jiucheng XU
Journal of Computer Applications    2023, 43 (6): 1842-1854.   DOI: 10.11772/j.issn.1001-9081.2022050691

To address the facts that most feature selection algorithms do not fully consider the non-uniform class distribution of data, the correlation between features, or the influence of parameter settings on feature selection results, a feature selection method for imbalanced data based on neighborhood tolerance mutual information and the Whale Optimization Algorithm (WOA) was proposed. Firstly, for binary and multi-class datasets in an incomplete neighborhood decision system, two feature importance measures for imbalanced data were defined on the basis of the upper and lower boundary regions. Then, to fully reflect both the decision-making ability of individual features and the correlation between features, neighborhood tolerance mutual information was developed. Finally, by integrating the imbalanced-data feature importance with neighborhood tolerance mutual information, a Feature Selection for Imbalanced Data based on Neighborhood tolerance mutual information (FSIDN) algorithm was designed, in which the optimal parameters of the feature selection algorithm were obtained by WOA; a nonlinear convergence factor and an adaptive inertia weight were introduced into WOA to keep it from falling into local optima. Experiments on 8 benchmark functions show that the improved WOA has good optimization performance, and feature selection experiments on 13 binary and 4 multi-class imbalanced datasets show that, compared with related algorithms, the proposed algorithm can effectively select feature subsets with good classification performance.
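The improved WOA described above can be illustrated with a minimal sketch. The abstract does not give the exact forms of the nonlinear convergence factor or the adaptive inertia weight, so a cosine-shaped decay and a linearly shrinking weight are assumed here; all function and parameter names are illustrative only.

```python
import math
import random

def woa_minimize(f, dim, n_whales=20, iters=200, lb=-5.0, ub=5.0, seed=0):
    """Whale Optimization Algorithm sketch with a nonlinear convergence
    factor and an adaptive inertia weight (hypothetical forms; the paper's
    exact formulas are not given in the abstract)."""
    rng = random.Random(seed)
    X = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_whales)]
    best = min(X, key=f)[:]
    for t in range(iters):
        # nonlinear convergence factor: decays along a cosine curve (assumed)
        a = 2.0 * math.cos(math.pi / 2 * t / iters)
        # adaptive inertia weight: shrinks as iterations proceed (assumed)
        w = 0.9 - 0.5 * t / iters
        for i, x in enumerate(X):
            r1, r2 = rng.random(), rng.random()
            A, C = 2 * a * r1 - a, 2 * r2
            if rng.random() < 0.5:
                if abs(A) < 1:          # exploit: encircle the current best
                    x_new = [w * b - A * abs(C * b - xi)
                             for b, xi in zip(best, x)]
                else:                   # explore: move around a random whale
                    xr = X[rng.randrange(n_whales)]
                    x_new = [w * xr_j - A * abs(C * xr_j - xi)
                             for xr_j, xi in zip(xr, x)]
            else:                       # spiral bubble-net move toward best
                l = rng.uniform(-1, 1)
                x_new = [abs(b - xi) * math.exp(l) * math.cos(2 * math.pi * l)
                         + w * b for b, xi in zip(best, x)]
            X[i] = [min(ub, max(lb, v)) for v in x_new]
            if f(X[i]) < f(best):
                best = X[i][:]
    return best, f(best)

sphere = lambda x: sum(v * v for v in x)   # a standard benchmark function
best, val = woa_minimize(sphere, dim=5)
```

On the sphere function this sketch converges close to the optimum at the origin; the paper's experiments use 8 standard benchmark functions.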

Improved AdaBoost algorithm based on base classifier coefficients and diversity
ZHU Liang, XU Hua, CUI Xin
Journal of Computer Applications    2021, 41 (8): 2225-2231.   DOI: 10.11772/j.issn.1001-9081.2020101584
To address the low combination efficiency of base classifiers and the overfitting tendency of the traditional AdaBoost algorithm, an improved algorithm based on base classifier coefficients and diversity, WD AdaBoost (AdaBoost based on Weight and Double-fault measure), was proposed. Firstly, according to the error rates of the base classifiers and the distribution of the sample weights, a new method for computing the base classifier coefficients was given to improve the combination efficiency of the base classifiers. Secondly, the double-fault measure was introduced into the base classifier selection strategy of WD AdaBoost to increase the diversity among base classifiers. On five datasets from different application fields, CeffAda, which uses only the new coefficient computation, reduced the test error by 1.2 percentage points on average compared with the traditional AdaBoost; meanwhile, WD AdaBoost achieved a lower error rate than WLDF_Ada, AD_Ada (Adaptive to Detection AdaBoost), sk_AdaBoost and other algorithms. The experimental results show that WD AdaBoost can combine base classifiers more efficiently, resist overfitting, and improve classification performance.
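For reference, one round of the standard AdaBoost update is sketched below. WD AdaBoost replaces the coefficient formula with one that also uses the sample weight distribution, but that formula is not given in the abstract, so the textbook coefficient is shown instead.

```python
import math

def adaboost_round(preds, y, weights):
    """One standard AdaBoost round for +/-1 labels. The paper replaces the
    classic coefficient below with one that also uses the weight
    distribution; that formula is not given in the abstract, so the
    textbook version alpha = 0.5 * ln((1 - eps) / eps) is shown."""
    eps = sum(w for w, p, t in zip(weights, preds, y) if p != t)
    eps = max(eps, 1e-10)                      # guard against a perfect learner
    alpha = 0.5 * math.log((1 - eps) / eps)    # base classifier coefficient
    # re-weight: misclassified samples gain weight, correct ones lose it
    new_w = [w * math.exp(-alpha * p * t)
             for w, p, t in zip(weights, preds, y)]
    z = sum(new_w)
    return alpha, [w / z for w in new_w]

y     = [1, 1, -1, -1]
preds = [1, -1, -1, -1]                        # one mistake -> eps = 0.25
alpha, w = adaboost_round(preds, y, [0.25] * 4)
```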
Over-sampling algorithm for imbalanced datasets
CUI Xin, XU Hua, SU Chen
Journal of Computer Applications    2020, 40 (6): 1662-1667.   DOI: 10.11772/j.issn.1001-9081.2019101817

In the Synthetic Minority Over-sampling TEchnique (SMOTE), noise samples may participate in the synthesis of new samples, so the rationality of the new samples is hard to guarantee. To address this problem, an improved algorithm combining a clustering algorithm, called Clustered Synthetic Minority Over-sampling TEchnique (CSMOTE), was proposed. The algorithm abandons SMOTE's linear interpolation between nearest neighbors and instead synthesizes new samples by linear interpolation between the cluster centers of the minority class and the samples in the corresponding clusters, and the samples involved in the synthesis are screened to reduce the possibility that noise samples take part. In repeated comparisons with four improved SMOTE variants and two under-sampling algorithms on six real datasets, CSMOTE obtained the highest AUC values on all datasets. The experimental results show that CSMOTE has better classification performance and can effectively alleviate the imbalanced sample distribution in datasets.
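A minimal sketch of the synthesis step described above, assuming the minority class has already been clustered and noisy members screened out; the data structures and names are illustrative, not the paper's.

```python
import random

def csmote(clusters, n_new, seed=0):
    """CSMOTE-style synthesis sketch: instead of interpolating between
    nearest minority neighbors (as SMOTE does), interpolate between each
    minority cluster's center and its own members. `clusters` maps a
    center (tuple) to the list of minority samples assigned to it."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        center, members = rng.choice(list(clusters.items()))
        x = rng.choice(members)
        lam = rng.random()              # position on the center-sample segment
        synthetic.append(tuple(c + lam * (xi - c)
                               for c, xi in zip(center, x)))
    return synthetic

clusters = {(0.0, 0.0): [(1.0, 0.0), (0.0, 1.0)],
            (5.0, 5.0): [(4.0, 5.0), (5.0, 4.0)]}
new_pts = csmote(clusters, n_new=10)
```

Because every synthetic point lies on a segment between a cluster center and one of that cluster's own members, new samples stay inside minority regions rather than bridging between them.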

Speaker recognition in strong noise environment based on auditory cortical neuronal receptive field
NIU Xiaoke, HUANG Yixin, XU Huaxing, JIANG Zhenyang
Journal of Computer Applications    2020, 40 (10): 3034-3040.   DOI: 10.11772/j.issn.1001-9081.2020020272
To address the susceptibility of speaker recognition to environmental noise, a new voiceprint extraction method was proposed based on the spatio-temporal filtering mechanism of the Spectro-Temporal Receptive Field (STRF) of neurons in the biological auditory cortex. In the method, secondary features were extracted from the STRF-based auditory scale-rate map and combined with traditional Mel-Frequency Cepstral Coefficients (MFCC) to obtain voiceprint features with strong tolerance to environmental noise. With a Support Vector Machine (SVM) as the classifier, tests on speech data with different Signal-to-Noise Ratios (SNR) showed that the STRF-based features were more robust to noise than MFCC but gave lower recognition accuracy, while the combined features improved recognition accuracy and retained good robustness to noise. The results verify the effectiveness of the proposed method for speaker recognition in strong noise environments.
Diversity analysis and improvement of AdaBoost
WANG Lingdi, XU Hua
Journal of Computer Applications    2018, 38 (3): 650-654.   DOI: 10.11772/j.issn.1001-9081.2017092226
To measure the diversity among the weak classifiers created by AdaBoost and to address AdaBoost's overfitting problem, an improved AdaBoost method based on the double-fault measure was proposed, built on an analysis of the relationship between four diversity measures and the classification accuracy of AdaBoost. Firstly, the Q statistic, correlation coefficient, disagreement measure and double-fault measure were selected for experiments on datasets from the UCI (University of California, Irvine) machine learning repository. Then, the relationship between diversity and ensemble accuracy was evaluated with the Pearson correlation coefficient. The results show that each measure tends to a stable value in the later iterations; in particular, the double-fault measure behaves similarly across datasets, increasing in the early iterations and stabilizing later. Finally, a weak classifier selection strategy based on the double-fault measure was put forward. The experimental results show that, compared with other commonly used ensemble methods, the improved AdaBoost algorithm reduces the test error by 1.5 percentage points on average and by 4.8 percentage points at most, so the proposed algorithm can improve classification performance.
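The double-fault measure, and a selection strategy built on it, can be sketched as follows. The selection rule shown (pick the candidate whose average double-fault against the current ensemble members is smallest) is one plausible reading of such a strategy, not necessarily the paper's exact procedure.

```python
def double_fault(pred_a, pred_b, y):
    """Pairwise double-fault diversity: the fraction of samples that BOTH
    classifiers misclassify. Lower values mean the pair rarely fails
    together, i.e. the pair is more diverse."""
    both_wrong = sum(1 for a, b, t in zip(pred_a, pred_b, y)
                     if a != t and b != t)
    return both_wrong / len(y)

def select_most_diverse(candidates, ensemble, y):
    """Among candidate weak classifiers (represented by their prediction
    vectors), pick the one with the smallest average double-fault
    against the current ensemble members."""
    def avg_df(cand):
        return sum(double_fault(cand, m, y) for m in ensemble) / len(ensemble)
    return min(candidates, key=avg_df)

y = [1, 1, -1, -1, 1]
ensemble = [[1, 1, -1, 1, 1]]     # one member, wrong only on sample 3
c1 = [1, 1, -1, 1, -1]            # also wrong on sample 3 (shared fault)
c2 = [-1, 1, -1, -1, 1]           # wrong only on sample 0 (no shared fault)
best = select_most_diverse([c1, c2], ensemble, y)
```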
Kernel fuzzy C-means clustering based on improved artificial bee colony algorithm
LIANG Bing, XU Hua
Journal of Computer Applications    2017, 37 (9): 2600-2604.   DOI: 10.11772/j.issn.1001-9081.2017.09.2600
The Kernel-based Fuzzy C-Means (KFCM) algorithm is sensitive to the initial clustering centers and easily falls into local optima, whereas the Artificial Bee Colony (ABC) algorithm is simple and converges quickly toward a global optimum. Exploiting this, a new clustering algorithm that alternates an Improved Artificial Bee Colony (IABC) algorithm with KFCM iteration was proposed. Firstly, the optimal solution found by IABC was used as the initial clustering centers of KFCM; IABC improved the employed bees' search behavior using the change rate of the difference from the current per-dimension optimal solution during iteration, balancing the global search and local exploitation abilities of the artificial bee colony algorithm. Secondly, the fitness function of the KFCM algorithm was constructed from the within-class and between-class distances, and the cluster centers were optimized by KFCM. Finally, IABC and KFCM were executed alternately to achieve optimal clustering. Simulation experiments on three benchmark test functions and six datasets from the UCI standard repository show that IABC-KFCM improves the clustering validity index by 1 to 4 percentage points compared with IABC-GFCM (Generalized Fuzzy Clustering algorithm based on Improved ABC), demonstrating strong robustness and high clustering precision.
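The fitness construction from within-class and between-class distance can be sketched as follows. The exact combination used in the paper is not given in the abstract, so the ratio between/within (to be maximized) is assumed here, evaluated on a hard assignment for simplicity.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def clustering_fitness(assignment):
    """`assignment` maps each cluster center (tuple) to its points.
    Within-class distance should be small and between-class distance
    large; their ratio (a hypothetical form -- the abstract does not give
    the exact formula) serves as the fitness to maximize."""
    centers = list(assignment)
    within = sum(sq_dist(p, c)
                 for c, pts in assignment.items() for p in pts)
    between = sum(sq_dist(centers[i], centers[j])
                  for i in range(len(centers))
                  for j in range(i + 1, len(centers)))
    return between / (within + 1e-12)   # small epsilon avoids division by zero

good = {(0.0,): [(0.1,), (-0.1,)], (5.0,): [(4.9,), (5.1,)]}  # tight clusters
bad  = {(2.0,): [(0.1,), (-0.1,)], (3.0,): [(4.9,), (5.1,)]}  # poor centers
```

A well-placed set of centers scores far higher than a poor one, which is what lets IABC rank candidate center sets before handing them to KFCM.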
Improved discrete particle swarm algorithm for solving flexible flow shop scheduling problem
XU Hua, ZHANG Ting
Journal of Computer Applications    2015, 35 (5): 1342-1347.   DOI: 10.11772/j.issn.1001-9081.2015.05.1342

An improved Discrete Particle Swarm Optimization (DPSO) algorithm was proposed for solving the Flexible Flow Shop scheduling Problem (FFSP) with the makespan criterion. The proposed algorithm redefined the particle velocity and position operators, and introduced an encoding matrix and a decoding matrix to represent the relationships among jobs, machines and schedules. To improve the quality of the initial population for the FFSP solution, the relationship between the initial machine assignment and the total completion time was analyzed, and a shortest-time decomposition strategy based on the NEH algorithm was proposed. The experimental results show that the algorithm performs well on the flexible flow shop scheduling problem and is an effective scheduling algorithm.
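The NEH heuristic on which the initial-population strategy is based can be sketched for the plain permutation flow shop (the paper adapts it to the flexible case): sort jobs by decreasing total processing time, then insert each job at the position minimizing the partial makespan.

```python
def makespan(seq, p):
    """Completion time of the last job on the last machine for a
    permutation flow shop; p[j][m] is job j's time on machine m."""
    m = len(p[0])
    finish = [0.0] * m
    for j in seq:
        for k in range(m):
            prev = finish[k - 1] if k else 0.0   # done on previous machine
            finish[k] = max(finish[k], prev) + p[j][k]
    return finish[-1]

def neh(p):
    """Classic NEH constructive heuristic: greedy best-position insertion
    in order of decreasing total processing time."""
    order = sorted(range(len(p)), key=lambda j: -sum(p[j]))
    seq = []
    for j in order:
        seq = min((seq[:i] + [j] + seq[i:] for i in range(len(seq) + 1)),
                  key=lambda s: makespan(s, p))
    return seq, makespan(seq, p)

p = [[3, 4, 6], [5, 3, 2], [1, 2, 1], [4, 4, 4]]   # 4 jobs x 3 machines
seq, cmax = neh(p)
```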

Dandelion: Rapid deployment mechanism of cloud platform based on OpenStack
LI Liyao, ZHAO Shaoka, WANG Ye, YANG Jiahai, XU Huarong
Journal of Computer Applications    2015, 35 (11): 3070-3074.   DOI: 10.11772/j.issn.1001-9081.2015.11.3070
A rapid, automatic deployment solution for OpenStack-based cloud platforms, named Dandelion, was presented to improve OpenStack deployment efficiency. Firstly, the solution created image template files for the different node types, replicated the image template per node type (such as network node or computing node), and automatically modified the configuration files with scripts according to node properties (such as IP address and hostname tag) to complete single-node deployment; the same strategy was then used for the rapid deployment of the remaining nodes. Next, the solution used the network services provided by the management servers (PXE (Preboot eXecution Environment), DHCP (Dynamic Host Configuration Protocol) and TFTP (Trivial File Transfer Protocol)) to mount the image block files, and finally the nodes were booted to complete the Dandelion deployment. In addition, a performance evaluation model was established to determine the optimal numbers of image copies and storage servers so as to optimize the storage network topology. The experimental results show that, compared with other deployment schemes such as Cobbler and NFS (Network File System), whether deploying cloud platforms of different sizes with the same-size storage network or the same-size cloud platform with storage networks of different sizes, the proposed solution greatly reduces deployment time and improves deployment efficiency.
Trust model based on node dependency and interaction frequency in wireless Mesh network
SONG Xiaoyu, XU Huan, BAI Qingyue
Journal of Computer Applications    2015, 35 (11): 3051-3054.   DOI: 10.11772/j.issn.1001-9081.2015.11.3051
The openness and dynamics of Wireless Mesh Network (WMN) make it widely used but also bring security problems, and traditional trust models can no longer meet WMN's security requirements. Based on the trust principles of social networks, a new trust model named TFTrust was proposed. In TFTrust, a multi-dimensional factor calculation method was defined, covering node contribution, node dependency and interaction frequency, and the calculation method for the direct trust value was established on these factors. The simulation results show that the TFTrust model outperforms the Ad Hoc On-demand Distance Vector routing (AODV) protocol and the Beth model in security and quality of service, and reduces network communication overhead.
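The direct-trust calculation can be sketched as a weighted combination of the three factors. The weights and the linear form are assumptions made for illustration; the abstract only states which factors enter the computation.

```python
def direct_trust(contribution, dependency, frequency,
                 weights=(0.4, 0.3, 0.3)):
    """Direct-trust sketch in the spirit of TFTrust: combine the three
    factor values (each assumed normalized to [0, 1]) with a weighted
    sum. The weights and the combination form are hypothetical."""
    w1, w2, w3 = weights
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9   # weights form a convex combination
    return w1 * contribution + w2 * dependency + w3 * frequency

t_good = direct_trust(0.9, 0.8, 0.7)   # cooperative, frequently seen node
t_bad  = direct_trust(0.2, 0.1, 0.3)   # rarely interacting, low-contribution node
```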
Range-based localization algorithm with virtual force in wireless sensor and actor network
WANG Haoyun, WANG Ke, LI Duo, ZHANG Maolin, XU Huanliang
Journal of Computer Applications    2014, 34 (10): 2777-2781.   DOI: 10.11772/j.issn.1001-9081.2014.10.2777

To solve the sensor node localization problem in Wireless Sensor and Actor Network (WSAN), a range-based localization algorithm with virtual force was proposed, in which mobile actor nodes were used instead of Wireless Sensor Network (WSN) anchors and Time Of Arrival (TOA) ranging was combined with virtual force. In this algorithm, actor nodes were driven by virtual force to move close to the sensor node that sent a location request, and node localization was completed by calculating inter-node distances from the signal propagation time. The simulation results show that the localization success rate of the proposed algorithm is improved by 20%, and its average localization time and cost are lower than those of the traditional TOA algorithm; it is applicable to real-time scenarios with a small number of actor nodes.
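The TOA ranging and position calculation the algorithm relies on can be sketched in 2D. The linearized trilateration solver below is a standard formulation, not necessarily the paper's exact one; the anchor layout is illustrative.

```python
import math

def toa_distance(t_seconds, c=3.0e8):
    """TOA ranging: distance = propagation time x signal speed."""
    return t_seconds * c

def trilaterate(anchors, dists):
    """Solve for (x, y) from three actor positions and ranges by
    subtracting circle equations to get two linear equations (standard
    2D trilateration)."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = dists
    # circle 2 minus circle 1, circle 3 minus circle 1
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1               # nonzero for non-collinear anchors
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]   # actor node positions
target = (3.0, 4.0)                                 # true sensor position
dists = [math.dist(a, target) for a in anchors]     # ideal TOA ranges
x, y = trilaterate(anchors, dists)
```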

Grid service discovery algorithm based on attribute weight and rough set
ZHAO Xu, HUANG Yong-zhong, AN Liu-yang
Journal of Computer Applications    2012, 32 (01): 167-169.   DOI: 10.3724/SP.J.1087.2012.00167
To solve the low efficiency of grid service discovery, an optimized service discovery algorithm that takes the weights of service properties into account was put forward, based on ontology technology, decision table theory and the knowledge representation system of rough sets. Through rule extraction from the service invocation history and calculation of the service property weights, the two main phases of the algorithm, information pre-processing and rough set service matching, were realized. Theoretical analysis and experimental verification were given for both precision and recall. The results show that the proposed algorithm achieves higher precision and recall, and produces a better ranking of the candidate services.
Decision tree optimization algorithm based on multiscale rough set model
CHEN Jia-jun, SU Shou-bao, XU Hua-li
Journal of Computer Applications    2011, 31 (12): 3243-3246.  
Concerning the complicated structure and noise sensitivity of decision trees constructed by classical decision tree algorithms, a new decision tree construction algorithm based on a multiscale rough set model was proposed. The algorithm introduced the concepts of scale variable and scale function, used the approximate classification accuracy at different scales to select test attributes, and put forward a hold-down factor to prune the decision tree and remove noise rules effectively. The results show that the decision trees constructed by this algorithm are simple, have a certain degree of anti-interference capability, and can meet the decision accuracy requirements of different users.
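The approximate classification accuracy used for attribute selection is the standard rough-set accuracy of approximation, which can be sketched as follows; the multiscale model evaluates it under partitions of different granularity (the scale), and the example partition here is illustrative.

```python
def approx_accuracy(blocks, target):
    """Standard rough-set accuracy of approximation: the ratio of the
    lower approximation size to the upper approximation size of `target`
    under the equivalence classes `blocks`. Coarser or finer partitions
    correspond to different scales in the multiscale model."""
    # lower approximation: blocks entirely contained in the target concept
    lower = [x for b in blocks if set(b) <= target for x in b]
    # upper approximation: blocks that overlap the target concept at all
    upper = [x for b in blocks if set(b) & target for x in b]
    return len(lower) / len(upper)

universe_blocks = [[1, 2], [3], [4, 5], [6]]   # one partition (one scale)
target = {1, 2, 3, 4}                           # decision class to describe
acc = approx_accuracy(universe_blocks, target)
```

Accuracy 1.0 means the partition describes the class exactly; lower values signal boundary-region uncertainty, which the algorithm compares across scales when choosing test attributes.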
ESR-Tree: a dynamic index for multi-dimensional objects
XU Huan, LIN Kun-hui
Journal of Computer Applications    2005, 25 (12): 2872-2874.  
Based on a study of the structure and performance of the SR-tree (Sphere/Rectangle-tree) and the X-tree (eXtended node tree), the split algorithm was improved to address the shortcomings of the SR-tree, and a new multi-dimensional index structure, the ESR-tree (Extended SR-tree), was designed by combining the advantages of both. Experiments show that, as the data amount and dimensionality increase, the performance of the ESR-tree is much better than that of the SR-tree and the X-tree.
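An SR-tree node region is the intersection of a bounding sphere and a bounding rectangle, a property the ESR-tree inherits; the region membership test can be sketched as below (the improved split algorithm itself is not detailed in the abstract).

```python
import math

def in_sr_region(point, rect_min, rect_max, center, radius):
    """SR-tree region test sketch: a point belongs to a node's region
    only if it lies inside BOTH the bounding rectangle and the bounding
    sphere, which tightens the region and prunes more during search."""
    in_rect = all(lo <= p <= hi
                  for p, lo, hi in zip(point, rect_min, rect_max))
    in_sphere = math.dist(point, center) <= radius
    return in_rect and in_sphere

inside  = in_sr_region((1.0, 1.0), (0, 0), (2, 2), (1.0, 1.0), 1.5)
outside = in_sr_region((1.9, 1.9), (0, 0), (2, 2), (1.0, 1.0), 1.0)  # in rect, not sphere
```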